Performance Evaluation for Question Classification by Tree Kernels using Support Vector Machines

نویسنده

  • Muhammad Arifur Rahman
چکیده

Question answering systems use information retrieval (IR) and information extraction (IE) methods to retrieve documents containing a valid answer. Question classification plays an important role in the question answer frame to reduce the gap between question and answer. This paper presents our research work on automatic question classification through machine learning approaches. We have experimented with machine learning algorithms Support Vector Machines (SVM) using kernel methods. An effective way to integrate syntactic structures for question classification in machine learning algorithms is the use of tree kernel (TK) functions. Here we use SubTree kernel, SubSet Tree kernel with Bag of words and Partial Tree kernels. Trade-off between training error and margin, Costfactor and the decay factor has significant impact when we use SVM for the above mentioned kernel types. The experiments determined the individual impact for Trade-off between training error and margin, Cost-factor and the decay factor and later the combined effect for Trade-off between training error and margin, Cost-factor. For each kernel types depending on these result we also figure out some hyper planes which can maximize the performance. Based on some standard data set outcomes of our experiment for question classification is promising. Index Terms — Precision, Recall, kernel, SubSet Tree, SubTree, Partial Tree, SVM, Question Classification, Question Answering.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A comparative study of performance of K-nearest neighbors and support vector machines for classification of groundwater

The aim of this work is to examine the feasibilities of the support vector machines (SVMs) and K-nearest neighbor (K-NN) classifier methods for the classification of an aquifer in the Khuzestan Province, Iran. For this purpose, 17 groundwater quality variables including EC, TDS, turbidity, pH, total hardness, Ca, Mg, total alkalinity, sulfate, nitrate, nitrite, fluoride, phosphate, Fe, Mn, Cu, ...

متن کامل

کاربرد الگوریتم‌های داده‌کاوی در تفکیک منابع رسوبی حوزۀ آبخیز نوده گناباد

Introduction: Reduction of sediment supply requires the implementation of soil conservation and sediment control programs in the form of watershed management plans. Sediment control programs require identifying the relative importance of sediment sources, their quantitative ascription and identification of critical areas within the watersheds. The sediment source ascription is involves two...

متن کامل

Separating Well Log Data to Train Support Vector Machines for Lithology Prediction in a Heterogeneous Carbonate Reservoir

The prediction of lithology is necessary in all areas of petroleum engineering. This means that to design a project in any branch of petroleum engineering, the lithology must be well known. Support vector machines (SVM’s) use an analytical approach to classification based on statistical learning theory, the principles of structural risk minimization, and empirical risk minimization. In this res...

متن کامل

Fault Detection and Classification in Double-Circuit Transmission Line in Presence of TCSC Using Hybrid Intelligent Method

In this paper, an effective method for fault detection and classification in a double-circuit transmission line compensated with TCSC is proposed. The mutual coupling of parallel transmission lines and presence of TCSC affect the frequency content of the input signal of a distance relay and hence fault detection and fault classification face some challenges. One of the most effective methods fo...

متن کامل

Detection of some Tree Species from Terrestrial Laser Scanner Point Cloud Data Using Support-vector Machine and Nearest Neighborhood Algorithms

acquisition field reference data using conventional methods due to limited and time-consuming data from a single tree in recent years, to generate reference data for forest studies using terrestrial laser scanner data, aerial laser scanner data, radar and Optics has become commonplace, and complete, accurate 3D data from a single tree or reference trees can be recorded. The detection and identi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • JCP

دوره 5  شماره 

صفحات  -

تاریخ انتشار 2010